
    Acquisition and Declarative Analytical Processing of Spatio-Temporal Observation Data

    A generic framework for spatio-temporal observation data acquisition and declarative analytical processing has been designed and implemented in this thesis. Its main contributions may be summarized as follows: 1) generalization of a data acquisition and dissemination server, broadly applicable across scientific and industrial domains, that provides flexibility in incorporating different technologies for data acquisition, data persistence and data dissemination, 2) definition of a new hybrid logical-functional paradigm to formalize a novel data model for the integrated management of entity and sampled data, 3) definition of a novel spatio-temporal declarative data analysis language for that data model, 4) definition of a data warehouse data model supporting observation data semantics, including application of the above language to the declarative definition of observation processes executed during observation data load, and 5) a column-oriented, parallel and distributed implementation of the spatial analysis declarative language. The huge amount of data to be processed forces the exploitation of current multi-core hardware architectures and multi-node cluster infrastructures.

    Study and comparison of different Machine Learning-based approaches to solve the inverse problem in Electrical Impedance Tomographies

    Electrical Impedance Tomography (EIT) is a non-invasive technique used to obtain the internal electrical conductivity distribution of bodies. This is a promising method from the manufacturing viewpoint, since it could be used to estimate different physical inner body properties during the production of goods. Nevertheless, this technique requires dealing with an inverse problem that makes its usage in real-time processes challenging. Recently, Machine Learning techniques have been proposed to solve the inverse problem accurately. However, the majority of prior research is focused on qualitative results and typically lacks a systematic methodology to determine the optimal hyperparameters. This work presents a systematic comparison of six popular Machine Learning algorithms: Artificial Neural Network, Random Forest, K-Nearest Neighbors, Elastic Net, AdaBoost, and Gradient Boosting. Particularly, the last two algorithms were based on decision tree learners. Furthermore, we studied the relationship between model performance and different EIT configurations. Specifically, we analyzed whether the measurement pattern and the number of electrodes used could increase the model performance. Experiments revealed that tree-based models present high performance, even better than Neural Networks, the most widely used Machine Learning model for EIT. Experiments also showed a model performance improvement when the EIT configuration was optimized. The most favorable metrics were attained using the tree-based Gradient Boosting model with a combination of both adjacent and mono measurement patterns and with 32 electrodes deployed during the tomographic process. With this particular setting, we achieved an accuracy of 99.14% detecting internal artifacts and a Root Mean Square Error of 4.75 predicting internal conductivity distributions.
    This work has received financial support from the Consellería de Educación, Universidade e Formación Profesional (accreditation 2019–2022 ED431G-2019/04) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS - Centro Singular de Investigación en Tecnoloxías Intelixentes da Universidade de Santiago de Compostela as a Research Center of the Galician University System. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

    Smart Environmental Data Infrastructures: Bridging the Gap between Earth Sciences and Citizens

    The monitoring and forecasting of environmental conditions is a task to which the scientific community and relevant authorities devote much effort and many resources. Representative examples arise in meteorology, oceanography, and environmental engineering. As a consequence, high volumes of data are generated, including data produced by earth observation systems and by different kinds of models. Specific data models, formats, vocabularies and data access infrastructures have been developed and are currently being used by the scientific community. Due to this, discovering, accessing and analyzing environmental datasets requires very specific skills, which is an important barrier to their reuse in many other application domains. This paper reviews earth science data representation and access standards and technologies, and identifies the main challenges to overcome in order to enable their integration in semantic open data infrastructures. This would allow non-scientific information technology practitioners to devise new end-user solutions for citizen problems in new application domains.
    This research was co-funded by (i) the TRAFAIR project (2017-EU-IA-0167), co-financed by the Connecting Europe Facility of the European Union, (ii) the RADAR-ON-RAIA project (0461_RADAR_ON_RAIA_1_E), co-financed by the European Regional Development Fund (ERDF) through the Interreg V-A Spain-Portugal program (POCTEP) 2014-2020, and (iii) the Consellería de Educación, Universidade e Formación Profesional of the regional government of Galicia (Spain), through the support for research groups with growth potential (ED431B 2018/28).

    A survey on machine learning in array databases

    This paper provides an in-depth survey on the integration of machine learning and array databases. First, machine learning support in modern database management systems is introduced. From straightforward implementations of linear algebra operations in SQL to the machine learning capabilities of specialized database managers designed to process specific types of data, a number of different approaches are reviewed. Then, the paper covers the database features already implemented in current machine learning systems. Features such as rewriting, compression, and caching allow users to implement more efficient machine learning applications. The underlying linear algebra computations in some of the most used machine learning algorithms are studied in order to determine which linear algebra operations should be efficiently implemented by array databases. An exhaustive overview of array data and relevant array database managers is also provided. Those database features that have proven to be of special importance for the efficient execution of machine learning algorithms are analyzed in detail for each relevant array database management system. Finally, the current state of array database capabilities for machine learning implementation is shown through two example implementations in Rasdaman and SciDB.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.